ISO-TimeML Event Extraction in Persian Text
Authors
Abstract
Recognizing TimeML events and identifying their attributes are important tasks in natural language processing (NLP). Several NLP applications, such as question answering, information retrieval, summarization, and temporal information extraction, need some knowledge about the events in their input documents. Existing methods developed for this task are restricted to a limited number of languages, and for many other languages, including Persian, no effort has yet been made. In this paper, we introduce two different approaches for automatic event recognition and classification in Persian. For this purpose, a corpus of events has been built based on a Persian-specific version of ISO-TimeML. We present the specification of this corpus together with the results of applying these approaches to it. Considering that these methods are the first effort towards Persian event extraction, the results are comparable to those of successful methods in English.
Similar resources
GATE-Time: Extraction of Temporal Expressions and Events
GATE is a widely used open-source solution for text processing with a large user community. It contains components for several natural language processing tasks. However, temporal information extraction functionality within GATE has been rather limited so far, despite being a prerequisite for many application scenarios in the areas of natural language processing and information retrieval. This ...
Techniques d'apprentissage supervisé pour l'extraction d'événements TimeML en anglais et français
Identifying events from texts is an information extraction task necessary for many NLP applications. Through the TimeML specifications and TempEval challenges, it has received some attention in recent years, yet no reference result is available for French. In this paper, we try to fill this gap by proposing several event extraction systems, combining for instance Conditional Random Fields, l...
Supervised Machine Learning Techniques to Detect TimeML Events in French and English
Identifying events from texts is an information extraction task necessary for many NLP applications. Through the TimeML specifications and TempEval challenges, it has received some attention in recent years. However, no reference result is available for French. In this paper, we try to fill this gap by proposing several event extraction systems, combining for instance Conditional Random Fields,...
Annotating Lexically Entailed Subevents for Textual Inference Tasks
This paper presents a procedure for constructing an Event Structure Lexicon (ESL), a resource which represents the lexically-entailed subevents in text as a support for textual inference tasks. The ESL is used as a resource for a subevent markup algorithm, called SUBEVITA, which annotates event implicatures on top of TimeML-based extraction algorithms. Such a resource can be used independently ...
Romanian TimeBank: An Annotated Parallel Corpus for Temporal Information
The paper describes the main steps for the construction, annotation and validation of the Romanian version of the TimeBank corpus. Starting from the English TimeBank corpus – the reference annotated corpus in the temporal domain, we have translated all the 183 English news texts into Romanian and mapped the English annotations onto Romanian, with a success rate of 96.53%. Based on ISO-Time the ...